High-Dimensional Bayesian Clustering with Variable Selection: The R Package bclust
Authors
Abstract
The R package bclust is useful for clustering high-dimensional continuous data. The package uses a parametric spike-and-slab Bayesian model to downweight the effect of noise variables and to quantify the importance of each variable in agglomerative clustering. We take advantage of the existence of closed-form marginal distributions to estimate the model hyper-parameters using empirical Bayes, thereby yielding a fully automatic method. We discuss computational problems arising in implementation of the procedure and illustrate the usefulness of the package through examples.
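The idea in the abstract can be illustrated with a small sketch. It is not the bclust implementation or its API: it is a generic Python toy written under simplifying assumptions. Each variable's cluster mean is either exactly zero (spike) or drawn from a zero-mean Gaussian (slab), the noise variance is known, and the hyper-parameters `sigma2`, `tau2` and `q` are fixed by hand rather than estimated by empirical Bayes. All function names (`spike_slab_logs`, `log_marginal`, `bayes_agglomerate`, `variable_importance`) are hypothetical. Because the marginal likelihood has a closed form, greedy agglomeration can merge, at each step, the pair of clusters that most increases it.

```python
import numpy as np

def spike_slab_logs(X, sigma2, tau2):
    """Per-variable log marginal likelihoods of one cluster X (n x p).
    Spike: cluster mean is 0. Slab: cluster mean ~ N(0, tau2), integrated out.
    Noise is N(0, sigma2) in both cases (an assumption of this sketch)."""
    n = X.shape[0]
    s = X.sum(axis=0)
    ss = (X ** 2).sum(axis=0)
    log_spike = -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * ss / sigma2
    # Slab marginal is N(0, sigma2*I + tau2*11'); closed form via the
    # matrix determinant lemma and the Sherman-Morrison identity.
    log_slab = (log_spike
                - 0.5 * np.log1p(n * tau2 / sigma2)
                + 0.5 * tau2 * s ** 2 / (sigma2 * (sigma2 + n * tau2)))
    return log_spike, log_slab

def log_marginal(X, sigma2=1.0, tau2=25.0, q=0.5):
    """Log marginal likelihood of one cluster; q is the prior slab probability."""
    log_spike, log_slab = spike_slab_logs(X, sigma2, tau2)
    return np.logaddexp(np.log(q) + log_slab, np.log1p(-q) + log_spike).sum()

def bayes_agglomerate(X, **hyper):
    """Greedy agglomeration: repeatedly merge the pair of clusters with the
    largest gain in log marginal likelihood; return the best partition seen."""
    clusters = [[i] for i in range(len(X))]
    total = sum(log_marginal(X[c], **hyper) for c in clusters)
    best_total, best_partition = total, [list(c) for c in clusters]
    while len(clusters) > 1:
        gain, a, b = max(
            (log_marginal(X[clusters[i] + clusters[j]], **hyper)
             - log_marginal(X[clusters[i]], **hyper)
             - log_marginal(X[clusters[j]], **hyper), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters)))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        total += gain
        if total > best_total:
            best_total = total
            best_partition = [sorted(c) for c in clusters]
    return best_partition, best_total

def variable_importance(X, partition, sigma2=1.0, tau2=25.0):
    """Illustrative importance score: log Bayes factor of slab versus spike,
    summed over clusters, one value per variable."""
    score = np.zeros(X.shape[1])
    for c in partition:
        log_spike, log_slab = spike_slab_logs(X[c], sigma2, tau2)
        score += log_slab - log_spike
    return score

# Toy data: variable 0 separates two groups, variable 1 is noise.
X = np.array([[5.1, 0.2], [4.9, -0.1], [5.0, 0.1],
              [-5.0, 0.0], [-4.8, -0.2], [-5.2, 0.1]])
partition, total = bayes_agglomerate(X)
importance = variable_importance(X, partition)
```

On this toy data the informative variable receives a positive score and the noise variable a negative one, which is the sense in which a spike-and-slab model can downweight noise variables. The exact likelihood, the importance measure and the hyper-parameter estimation used by bclust may differ from this sketch.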
Similar resources
bartMachine: Machine Learning with Bayesian Additive Regression Trees
We present a new package in R implementing Bayesian additive regression trees (BART). The package introduces many new features for data analysis using BART such as variable selection, interaction detection, model diagnostic plots, incorporation of missing data and the ability to save trees for future prediction. It is significantly faster than the current R implementation, parallelized, and cap...
Combining a relaxed EM algorithm with Occam's razor for Bayesian variable selection in high-dimensional regression
We address the problem of Bayesian variable selection for high-dimensional linear regression. We consider a generative model that uses a spike-and-slab-like prior distribution obtained by multiplying a deterministic binary vector, which traduces the sparsity of the problem, with a random Gaussian parameter vector. The originality of the work is to consider inference through relaxing the model a...
Sparse Bayesian hierarchical modeling of high-dimensional clustering problems
Clustering is one of the most widely used procedures in the analysis of microarray data, for example with the goal of discovering cancer subtypes based on observed heterogeneity of genetic marks between different tissues. It is well-known that in such high-dimensional settings, the existence of many noise variables can overwhelm the few signals embedded in the high-dimensional space. We propose ...
Bayesian Variable Selection in Clustering High-Dimensional Data With Substructure
In this article we focus on clustering techniques recently proposed for high-dimensional data that incorporate variable selection and extend them to the modeling of data with a known substructure, such as the structure imposed by an experimental design. Our method essentially approximates the within-group covariance by facilitating clustering without disrupting the groups defined by the experime...
BANFF: An R Package for BAyesian Network Feature Finder
Feature selection on high-dimensional networks plays an important role in understanding of biological mechanisms and disease pathologies. It has a broad range of applications. Recently, a Bayesian nonparametric mixture model (Zhao, Kang, and Yu 2014) has been successfully applied for selecting gene and gene sub-networks. We extend this method to a unified approach for feature selection on gener...
Publication date: 2012